Resampling performance improvement and sparse aggregation columns support #3062

Merged: IvoDD merged 10 commits into master from sparse-resampling-support on Jun 4, 2026

@IvoDD IvoDD commented Apr 30, 2026

Reference Issues/PRs

Monday ref: 11679866800

Depends on PRs #3091 and #3110

Issues

  • There is complicated bucket hopping logic in three places: generate_output_index_column, generate_resampling_output_column, SortedAggregator::aggregate
  • The bucket hopping logic involves many branches with loads of checks

Changes (split per commit for easier review)

  1. Adds C++ benchmarks which measure the CPU-intensive part of resampling.
  2. Pure move of the generate_output_index_column to sorted_aggregation.cpp.
    • This way all bucket hopping logic is in one place.
  3. Construct a ResampleMapping in generate_output_index_column and use it directly in other methods.
    • ResampleMapping just has a mapping from output_row to (start_column_index, start_column_offset), (end_column_index, end_column_offset).
    • Resolves the 3 places with similar logic.
    • Makes the implementation of sparse aggregation easier.
  4. Use galloping search in generate_output_index_column to skip past all rows in a single bucket at once.
    • Index column construction was the bottleneck: aggregation vectorises well but index iteration does not.
    • Changes complexity from O(num_input_rows + num_buckets) to O(num_buckets × log(rows_per_bucket)).
    • The new bound never exceeds O(num_input_rows + num_buckets), even when num_buckets ≥ num_input_rows.
    • Galloping search has a higher constant than linear scan and regresses at low rows per bucket.
  5. Preallocate the output index column to min(num_buckets, num_input_rows) instead of num_buckets.
    • Slightly improves the case where most buckets are empty due to smaller allocation.
  6. Use a runtime heuristic to choose between linear scan and galloping search.
    • Linear scan is faster below ~32 rows/bucket (because of smaller constant and better branch prediction); galloping search is faster above.
    • Threshold determined empirically from benchmarks at intermediate bucket counts. Extra benchmarking was done with more parametrization of the existing benchmark. Not kept in PR to avoid a huge amount of benchmarking code.
    • Recovers the Dense-10 (100k buckets) and Empty regressions introduced by the galloping search change (item 4 above, row 3 in the tables below) while retaining all gains elsewhere.
  7. Implement sparse resampling.
    • Small change made straightforward by the ResampleMapping introduced in item 3 above (row 2 in the tables below).
    • Minimal overhead for the dense case.
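
The mapping and galloping search described in items 3 and 4 can be sketched roughly as below. This is a simplified illustration and not the PR's code: `BucketRows`, `gallop_lower_bound` and `build_mapping` are hypothetical names, the index is a single flat sorted vector instead of multiple column blocks (so there is one row range per bucket rather than the (column_index, column_offset) pairs of the real ResampleMapping), and buckets are assumed to be left-closed, right-open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for ResampleMapping: one entry per
// non-empty bucket, holding a half-open range [begin, end) of input rows.
struct BucketRows {
    std::size_t bucket;
    std::size_t begin;
    std::size_t end;
};

// Returns the first position in [from, index.size()) whose value is >= boundary.
// The step doubles until it overshoots, then the last window is binary-searched,
// so the cost is logarithmic in the distance travelled.
inline std::size_t gallop_lower_bound(
        const std::vector<std::int64_t>& index, std::size_t from, std::int64_t boundary) {
    const std::size_t n = index.size();
    if (from >= n || index[from] >= boundary) {
        return from;
    }
    std::size_t lo = from;  // invariant: index[lo] < boundary
    std::size_t step = 1;
    while (lo + step < n && index[lo + step] < boundary) {
        lo += step;
        step *= 2;
    }
    const std::size_t hi = std::min(lo + step, n);
    // The answer lies in (lo, hi]; search the open part and fall back to hi.
    const auto first = index.begin() + static_cast<std::ptrdiff_t>(lo + 1);
    const auto last = index.begin() + static_cast<std::ptrdiff_t>(hi);
    return static_cast<std::size_t>(std::lower_bound(first, last, boundary) - index.begin());
}

// Assumption: bucket b covers [boundaries[b], boundaries[b + 1]).
// Empty buckets produce no entry, so the result has at most
// min(num_buckets, num_input_rows) entries.
inline std::vector<BucketRows> build_mapping(
        const std::vector<std::int64_t>& index, const std::vector<std::int64_t>& boundaries) {
    std::vector<BucketRows> mapping;
    if (boundaries.size() < 2) {
        return mapping;
    }
    std::size_t row = gallop_lower_bound(index, 0, boundaries.front());
    for (std::size_t b = 0; b + 1 < boundaries.size() && row < index.size(); ++b) {
        const std::size_t end = gallop_lower_bound(index, row, boundaries[b + 1]);
        if (end > row) {
            mapping.push_back({b, row, end});
        }
        row = end;
    }
    return mapping;
}
```

Once such a mapping exists, each aggregator only needs to iterate the stored row ranges, which is the sense in which the bucket hopping logic ends up in one place.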

Resample benchmark timings

BM_resample/<rows_per_seg>/<num_segs>/<num_buckets>/<num_cols>. Total rows ~1M.
Source: cpp/arcticdb/processing/test/benchmark_resample.cpp. Times in ms, --benchmark_min_time=2s.

| Regime | Args (rows_per_seg × num_segs, num_buckets) | rows/bucket | Description |
|---|---|---:|---|
| Dense-1k | 100k × 10, 1k buckets | ~1000 | Many rows/bucket, single row-slice |
| Dense-100 | 100k × 10, 10k buckets | ~100 | Medium rows/bucket, single row-slice |
| Dense-10 | 100k × 10, 100k buckets | ~10 | Few rows/bucket, single row-slice |
| Spanning | 2k × 500, 100 buckets | ~10k | Buckets span multiple row-slices |
| Empty | 100k × 10, 10M buckets | <1 | Bucket smaller than row spacing; most empty |

1 aggregation column

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---|---|---:|---:|---:|---:|---:|
| 0 | Baseline | 1.27 | 1.34 | 1.47 | 1.65 | 11.1 |
| 1 | Code move | 1.02 (−20%) | 1.12 (−16%) | 1.27 (−14%) | 1.40 (−15%) | 11.1 (0%) |
| 2 | ResampleMapping | 1.02 (−20%) | 1.12 (−16%) | 1.32 (−10%) | 1.40 (−15%) | 11.8 (+6%) |
| 3 | Galloping search | 0.059 (−95%) | 0.385 (−71%) | 2.94 (+100%) | 0.285 (−83%) | 21.9 (+97%) |
| 4 | Bounded allocation | 0.058 (−95%) | 0.396 (−70%) | 2.91 (+98%) | 0.291 (−82%) | 21.5 (+94%) |
| 5 | Heuristic (lin/EUB) | 0.059 (−95%) | 0.383 (−71%) | 1.27 (−14%) | 0.293 (−82%) | 11.5 (+4%) |
| 6 | Sparse-input support | 0.068 (−95%) | 0.449 (−66%) | 1.28 (−13%) | 0.296 (−82%) | 11.5 (+4%) |

100 aggregation columns

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---|---|---:|---:|---:|---:|---:|
| 0 | Baseline | 1.37 | 1.43 | 1.56 | 6.22 | 48.0 |
| 1 | Code move | 1.11 (−19%) | 1.18 (−17%) | 1.34 (−14%) | 5.92 (−5%) | 46.2 (−4%) |
| 2 | ResampleMapping | 1.11 (−19%) | 1.19 (−17%) | 1.39 (−11%) | 5.87 (−6%) | 50.4 (+5%) |
| 3 | Galloping search | 0.148 (−89%) | 0.471 (−67%) | 2.96 (+90%) | 4.65 (−25%) | 63.1 (+31%) |
| 4 | Bounded allocation | 0.148 (−89%) | 0.480 (−66%) | 2.95 (+89%) | 4.67 (−25%) | 44.1 (−8%) |
| 5 | Heuristic (lin/EUB) | 0.149 (−89%) | 0.477 (−67%) | 1.33 (−15%) | 4.70 (−24%) | 35.9 (−25%) |
| 6 | Sparse-input support | 0.158 (−88%) | 0.537 (−62%) | 1.35 (−13%) | 4.94 (−21%) | 36.0 (−25%) |

Deltas vs baseline (row 0).

Notes on benchmark results

  • Load average varied across runs, so the results contain some artifacts, such as the apparent improvement from the pure "Code move".
  • Galloping search gives a large speedup when there are many rows per bucket. Thorough benchmarking showed that exponential upper bound (EUB) search becomes faster than linear search at ~32 rows per bucket, hence the regressions in the 10 rows per bucket and mostly empty bucket cases.
  • Bounded allocation mostly helps the Empty case, as expected.
  • Using the heuristic to choose between EUB and linear search helps when rows_per_bucket < 32. It is even faster than the baseline there, due to slightly better branch prediction (improved use of ARCTICDB_LIKELY and ARCTICDB_UNLIKELY).
  • Final state: every regime is at or faster than baseline, or within noise of it. Dense-1k (~1000 rows per bucket) is the biggest winner, at roughly 19x faster with 1 aggregation column (1.27 ms to 0.068 ms). The mostly empty bucket case is the only one with no improvement in the 1 aggregation column runs, where it stays around baseline (+4%).
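
The runtime heuristic mentioned in the notes above could look something like the sketch below. The ~32 rows per bucket crossover is the empirical figure from this PR; everything else is an assumption made for illustration, including the names `SearchStrategy`, `choose_strategy` and `kGallopingThresholdRowsPerBucket`, the use of the average rows per bucket as the signal, and the choice to gallop at exactly 32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class SearchStrategy { Linear, Galloping };

// Empirical crossover quoted in the PR description; the constant name is hypothetical.
constexpr std::size_t kGallopingThresholdRowsPerBucket = 32;

// Picks a strategy from the average number of input rows per bucket.
// Assumption: the average is a good enough proxy for the per-bucket row count.
inline SearchStrategy choose_strategy(std::size_t num_input_rows, std::size_t num_buckets) {
    if (num_buckets == 0) {
        return SearchStrategy::Linear;
    }
    const std::size_t rows_per_bucket = num_input_rows / num_buckets;
    return rows_per_bucket >= kGallopingThresholdRowsPerBucket ? SearchStrategy::Galloping
                                                               : SearchStrategy::Linear;
}

// Linear alternative to a galloping search: advance one row at a time until
// the boundary is reached. Cheap per step and friendly to branch prediction
// when buckets are short.
inline std::size_t linear_lower_bound(
        const std::vector<std::int64_t>& index, std::size_t from, std::int64_t boundary) {
    while (from < index.size() && index[from] < boundary) {
        ++from;
    }
    return from;
}
```

With the benchmark shapes above (about 1M input rows), this sketch would pick galloping search for Dense-1k and Dense-100 and linear scan for Dense-10 and Empty.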

@IvoDD IvoDD changed the base branch from master to arrow-use-in-memory-storage-for-unit-tests April 30, 2026 10:17
@maxim-morozov maxim-morozov self-requested a review April 30, 2026 16:42
@IvoDD IvoDD force-pushed the arrow-use-in-memory-storage-for-unit-tests branch from 419c30a to 0de92a2 Compare May 5, 2026 14:11
Base automatically changed from arrow-use-in-memory-storage-for-unit-tests to master May 7, 2026 11:30
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from a5ac868 to a9e8ee4 Compare May 11, 2026 09:18
@IvoDD IvoDD changed the base branch from master to binary-search-utils May 11, 2026 09:18
@IvoDD IvoDD force-pushed the sparse-resampling-support branch 2 times, most recently from 36122bc to 4231a4f Compare May 12, 2026 15:18
@IvoDD IvoDD force-pushed the binary-search-utils branch 2 times, most recently from 5679aa0 to 4b7e881 Compare May 13, 2026 08:20
@IvoDD IvoDD force-pushed the sparse-resampling-support branch 3 times, most recently from 210a17b to 086284c Compare May 13, 2026 14:53
@IvoDD IvoDD mentioned this pull request May 14, 2026
5 tasks
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from 086284c to 5e4edb7 Compare May 14, 2026 12:10
@IvoDD IvoDD added the patch Small change, should increase patch version label May 14, 2026
@IvoDD IvoDD changed the title [Draft] Sparse resampling support Resampling performance improvement and sparse aggragation columns support May 14, 2026
@IvoDD IvoDD changed the base branch from binary-search-utils to binary-search-utils-optimization May 14, 2026 13:12
@IvoDD IvoDD marked this pull request as ready for review May 14, 2026 13:47

claude Bot commented May 14, 2026

ArcticDB Code Review Summary

Delta since last review is two new commits on the rebased branch (8d9a8ff "Sparse resample hypothesis test", 25f608d "Fix resampling sparse docs"). The 54-file BEFORE..AFTER range is dominated by rebase noise from master; the genuinely new PR content touches only 4 files (_store.py, library.py, processing.py, test_arrow_sparse.py). All changes are correct and low-risk.

Documentation

  • The obsolete "resampling does not yet support sparse data" warnings in compact_incomplete/compact docstrings (_store.py, library.py) and the QueryBuilder.resample/agg docstrings (processing.py) have been removed, aligning user-facing docs with the new sparse-aggregation support. This resolves the previously-flagged documentation gap.
  • The Claude-maintained technical docs (docs/claude/cpp/PROCESSING.md, docs/claude/python/QUERY_PROCESSING.md) do not contain contradictory statements, so no further doc action is required for this delta.

Tests

  • New test_sparse_polars_resample_hypothesis exercises sparse float resampling across all aggregation ops against polars - good coverage for the new feature.
  • _polars_agg_expr correctly hoisted from a TestSparseArrowResample staticmethod to module scope; all call sites updated (no stale self._polars_agg_expr references remain).

Notes (no action required)

  • This PR is stacked and was rebased onto a newer master (which now contains the compact_data/arithmetic-promotion/pow changes that appear as noise in the raw delta) - merge order still matters.

@IvoDD IvoDD changed the title Resampling performance improvement and sparse aggragation columns support Resampling performance improvement and sparse aggregation columns support May 14, 2026
Comment thread cpp/arcticdb/processing/test/test_resample.cpp
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp Outdated
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp Outdated
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp
@IvoDD IvoDD force-pushed the binary-search-utils-optimization branch from 0c2d98c to 6120021 Compare May 21, 2026 09:18
IvoDD added a commit that referenced this pull request May 21, 2026
#### Reference Issues/PRs
Optimizations on top of #3091
Used in #3062 

#### What does this implement or fix?
Some micro optimizations on binary search methods:
- Don't keep `TypedBlockData` in `ColumnDataIterator`. Instead only keep
`block_data_` and `block_size_`
- Don't recalculate block pointer and size when we already know them
during gallop

#### Any other comments?
Benchmarks for all search and iteration methods:

| Benchmark | Before (ns) | After (ns) | Delta |
|---|---:|---:|---:|
| iterate_irregular_blocks_1 (one row per block) | 478,496 | 311,163 | −35.0% |
| iterate_with_iterator (100 rows) | 798 | 719 | −9.9% |
| exponential_lb_single_block (in first 100) | 356 | 323 | −9.2% |
| exponential_lb_single_block (full gallop) | 458 | 424 | −7.4% |
| exponential_lb_regular (in first 100) | 364 | 339 | −6.7% |
| exponential_lb_irregular_1000 (in first 100) | 360 | 335 | −6.7% |
| exponential_lb_irregular_1000 (full gallop) | 496 | 476 | −3.9% |
| exponential_lb_regular (full gallop) | 504 | 489 | −2.9% |
| exponential_lb_irregular_1 (in first 100) | 464 | 455 | −2.0% |
| exponential_lb_irregular_1 (full gallop) | 687 | 679 | −1.3% |
| lower_bound_single_block | 411 | 394 | −4.1% |
| lower_bound_irregular_1000 | 444 | 431 | −3.0% |
| lower_bound_irregular_1 | 595 | 579 | −2.8% |
| lower_bound_regular_blocks | 443 | 436 | −1.4% |
| iterate_single_block | 27,305 | 27,247 | −0.2% |
| iterate_regular_blocks | 29,051 | 28,734 | −1.1% |
| iterate_irregular_blocks_1000 | 28,136 | 27,893 | −0.9% |
| iterate_with_scalar_at (100 rows) | 182,183,122 | 182,088,026 | −0.1% |

Co-authored-by: Ivo <ivo.dilov@man.com>
Base automatically changed from binary-search-utils-optimization to master May 21, 2026 12:06
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from 5e4edb7 to 89d9fd8 Compare May 21, 2026 14:20

It would be nice if we can get hypothesis tests covering some basic scenarios against polars, no need to test all supported parameters as some are quite painful to test.

size_t{0},
[](size_t acc, const auto& col) { return acc + col->row_count(); }
);
const auto max_output_rows = std::min(bucket_boundaries.size() - 1, total_input_rows);

How can you end up with bucket_boundaries.size() - 1 > total_input_rows?

IvoDD (author) replied:

When there are many empty buckets, e.g. resample("1h") on a table with a 24h frequency such as 2026-01-01, 2026-01-02, 2026-01-03.
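
To make the arithmetic in that example concrete, here is a small sketch. Timestamps are expressed in whole hours for simplicity, and `hourly_boundaries` and `max_output_rows` are hypothetical helpers that only mirror the `std::min` expression in the snippet under review. It assumes that empty buckets produce no output row, which is what makes the smaller bound safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: hourly bucket boundaries covering [first_hour, last_hour],
// so the last timestamp falls into the final bucket.
inline std::vector<std::int64_t> hourly_boundaries(std::int64_t first_hour, std::int64_t last_hour) {
    std::vector<std::int64_t> boundaries;
    for (std::int64_t h = first_hour; h <= last_hour + 1; ++h) {
        boundaries.push_back(h);
    }
    return boundaries;
}

// Upper bound on the number of output rows: every output row needs at least
// one input row, and there is at most one output row per bucket.
inline std::size_t max_output_rows(std::size_t num_boundaries, std::size_t total_input_rows) {
    const std::size_t num_buckets = num_boundaries == 0 ? 0 : num_boundaries - 1;
    return std::min(num_buckets, total_input_rows);
}
```

Three daily rows at hours 0, 24 and 48 span 49 hourly buckets, yet at most 3 output rows can exist, so preallocating for 3 rows is enough.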

Ivo added 10 commits June 3, 2026 15:07
Previously each of `generate_output_index_column`,
`generate_resample_output_column` and `aggregate` had complicated logic
to identify which input rows correspond to which output row.

This is simplified by creating a `ResampleMapping` when building the
output index column to store which output row corresponds to which input
values. Then `ResampleMapping` is used in the other methods.

A lot of resampling runtime was spent generating the output index
column. This can be sped up significantly in the common case where the
number of buckets is much smaller than the number of input rows by using
exponential binary search.

Helps speed up and decrease memory usage for the very rare case where num_buckets >> num_input_rows.

Benchmarking with various rows_per_bucket values confirmed that
exponential_search becomes faster than linear scan at around 32 elements.

For <32 rows per bucket the linear pass is faster. For >32 the
exponential search is faster.

Construct output agg column based on rs_index of input sparse columns.

Then use sparse iterators to populate the values.
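
As a rough illustration of the idea in the last commit message, the sketch below aggregates a sparse column bucket by bucket. It is not ArcticDB's implementation: `SparseColumn` and `sparse_max` are hypothetical, the bitmap plus dense values layout is an assumed stand-in for the real sparse column and its rs_index, and leaving a bucket missing when it contains no present values is an assumed semantic. It requires C++17 and expects the row ranges to be sorted and non-overlapping.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sparse column: present[i] says whether logical row i has a
// value, and values holds only the present values in row order.
struct SparseColumn {
    std::vector<bool> present;
    std::vector<double> values;
};

// Max per bucket. Each range is a half-open [begin, end) span of logical rows.
// A bucket without any present value yields std::nullopt (assumed semantic).
inline std::vector<std::optional<double>> sparse_max(
        const SparseColumn& col, const std::vector<std::pair<std::size_t, std::size_t>>& row_ranges) {
    std::vector<std::optional<double>> out;
    out.reserve(row_ranges.size());
    std::size_t row = 0;        // cursor over logical rows
    std::size_t value_pos = 0;  // cursor over the dense values vector
    for (const auto& [begin, end] : row_ranges) {
        // Skip rows before this bucket while keeping value_pos in sync.
        for (; row < begin; ++row) {
            value_pos += col.present[row] ? 1 : 0;
        }
        std::optional<double> result;
        for (; row < end; ++row) {
            if (!col.present[row]) {
                continue;
            }
            const double v = col.values[value_pos++];
            if (!result || v > *result) {
                result = v;
            }
        }
        out.push_back(result);
    }
    return out;
}
```

In the dense case every row is present, so the presence check is the only extra work per row in this sketch.
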
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from 89d9fd8 to 25f608d Compare June 3, 2026 14:11

IvoDD commented Jun 4, 2026

The time_compact_data benchmark failure is unrelated. ColumnDataIterator changes were from a previous PR.

@IvoDD IvoDD merged commit 4361105 into master Jun 4, 2026
225 of 226 checks passed
@IvoDD IvoDD deleted the sparse-resampling-support branch June 4, 2026 06:51